Scorers are passed to a weave.Evaluation object during evaluation. There are two types of Scorers in Weave:
- Function-based Scorers: Simple Python functions decorated with @weave.op.
- Class-based Scorers: Python classes that inherit from weave.Scorer for more complex evaluations.
Create your own Scorers
Ready-to-Use Scorers
While this guide shows you how to create custom scorers, Weave also comes with a variety of predefined scorers and local SLM scorers that you can use right away.
Function-based Scorers
These are functions decorated with @weave.op that return a dictionary. They’re great for simple evaluations. For example, a scorer named evaluate_uppercase can check, when the evaluation is run, whether the text is all uppercase.
Class-based Scorers
For more advanced evaluations, especially when you need to keep track of additional scorer metadata, try different prompts for your LLM-evaluators, or make multiple function calls, you can use the Scorer class. For example, a class-based scorer can evaluate how good a summary is by comparing it to the original text.
Requirements:
- Inherit from weave.Scorer.
- Define a score method decorated with @weave.op.
- The score method must return a dictionary.
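The two scorer styles described above can be sketched as follows. To keep the sketch runnable without Weave installed, it uses plain Python: in a real project the function and the score method would be decorated with @weave.op and the class would inherit from weave.Scorer, as the requirements state. The class name SummaryLengthScorer, its max_ratio parameter, and the scoring logic are hypothetical and only illustrate the shape of a scorer.

```python
# Function-based scorer. In Weave, this function would be decorated with @weave.op.
def evaluate_uppercase(output: str) -> dict:
    """Check whether the model output is entirely uppercase."""
    return {"text_is_uppercase": output.isupper()}


# Class-based scorer. In Weave, this class would inherit from weave.Scorer and
# score() would be decorated with @weave.op; a plain class is used here so the
# sketch runs on its own.
class SummaryLengthScorer:
    def __init__(self, max_ratio: float = 0.5):
        # Scorer metadata kept on the instance (hypothetical parameter).
        self.max_ratio = max_ratio

    def score(self, output: str, text: str) -> dict:
        """Compare the summary (output) with the original text by length."""
        ratio = len(output) / len(text) if text else 0.0
        return {"length_ratio": ratio, "is_concise": ratio <= self.max_ratio}


uppercase_result = evaluate_uppercase("HELLO WORLD")
summary_result = SummaryLengthScorer().score(
    output="Short.", text="A much longer original text."
)
print(uppercase_result)
print(summary_result)
```

Both scorers return a dictionary, which is the one requirement the two styles share.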
How Scorers Work
Scorer Keyword Arguments
Scorers can access both the output from your AI system and the input data from the dataset row.
- Input: If you would like your scorer to use data from your dataset row, such as a “label” or “target” column, you can make it available to the scorer by adding a label or target keyword argument to your scorer definition. For example, to use a dataset column called label, your scorer function (or score class method) would include label in its parameter list alongside output.
- Output: Include an output parameter in your scorer function’s signature to access the AI system’s output.
When a Weave Evaluation is run, the output of the AI system is passed to the output parameter. The Evaluation also automatically tries to match any additional scorer argument names to your dataset columns. If customizing your scorer arguments or dataset columns is not feasible, you can use column mapping, described in the next section.
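The argument matching described above can be illustrated with a small plain-Python sketch. The call_scorer helper is hypothetical and is not Weave's implementation; it only mimics the idea that the AI system's output goes to the output parameter and that other scorer argument names are matched against dataset columns.

```python
import inspect


def match_label(label: str, output: str) -> dict:
    """Scorer that reads the dataset's label column and the model output."""
    return {"match": label == output}


def call_scorer(scorer, row: dict, output) -> dict:
    """Hypothetical helper: pass output plus any row columns the scorer names."""
    params = inspect.signature(scorer).parameters
    # Keep only the dataset columns whose names appear in the scorer signature.
    kwargs = {name: row[name] for name in params if name != "output" and name in row}
    if "output" in params:
        kwargs["output"] = output
    return scorer(**kwargs)


row = {"question": "Capital of France?", "label": "Paris"}
result = call_scorer(match_label, row, output="Paris")
print(result)
```

The question column is ignored here because match_label does not name it in its signature.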
Mapping Column Names with column_map
Sometimes, the score method’s argument names don’t match the column names in your dataset. You can fix this using a column_map.
If you’re using a class-based scorer, pass a dictionary to the column_map attribute of Scorer when you initialise your scorer class. This dictionary maps your score method’s argument names to the dataset’s column names, in the order: {scorer_keyword_argument: dataset_column_name}.
Example: with the mapping {"text": "news_article"}, the text argument in the score method will receive data from the news_article dataset column.
Notes:
- Another equivalent option to map your columns is to subclass the Scorer and override the score method, mapping the columns explicitly.
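The direction of the mapping is easy to get backwards, so here is a minimal plain-Python sketch of what a column_map does. The apply_column_map helper and the toy score function are hypothetical; with Weave, the same dictionary would instead be passed as column_map when initialising the scorer class.

```python
def apply_column_map(row: dict, column_map: dict) -> dict:
    """Hypothetical helper: expose dataset columns under scorer argument names.

    column_map follows the {scorer_keyword_argument: dataset_column_name} order.
    """
    renamed = dict(row)
    for scorer_arg, dataset_column in column_map.items():
        renamed[scorer_arg] = row[dataset_column]
    return renamed


def score(output: str, text: str) -> dict:
    """Toy scorer: is the summary shorter than the source text?"""
    return {"summary_shorter": len(output) < len(text)}


row = {"news_article": "The market rose sharply today.", "date": "2024-01-01"}
column_map = {"text": "news_article"}  # scorer argument -> dataset column

mapped = apply_column_map(row, column_map)
result = score(output="Market rose.", text=mapped["text"])
print(result)
```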
Final summarization of the scorer
During evaluation, the scorer is computed for each row of your dataset. To provide a final score for the evaluation, Weave provides an auto_summarize that depends on the return type of the output:
- Averages are computed for numerical columns
- Count and fraction for boolean columns
- Other column types are ignored
You can override the summarize method on the Scorer class and provide your own way of computing the final scores. The summarize function expects:
- A single parameter score_rows: This is a list of dictionaries, where each dictionary contains the scores returned by the score method for a single row of your dataset.
- It should return a dictionary containing the summarized scores.
For a scorer that returns a boolean for each row, the default auto_summarize would return the count and proportion of True.
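As a rough illustration of the difference between the default summary and a custom one, the plain-Python sketch below scores three rows and summarizes them both ways. CorrectnessScorer and default_like_summary are hypothetical, and the key names true_count and true_fraction are illustrative rather than a statement of Weave's exact output format; in Weave the class would inherit from weave.Scorer.

```python
from typing import Optional


def default_like_summary(score_rows: list) -> dict:
    """Sketch of the default behaviour for one boolean column: count and fraction."""
    values = [row["correct"] for row in score_rows]
    if not values:
        return {}
    true_count = sum(values)
    return {
        "correct": {
            "true_count": true_count,
            "true_fraction": true_count / len(values),
        }
    }


class CorrectnessScorer:
    def score(self, output: str, target: str) -> dict:
        return {"correct": output == target}

    def summarize(self, score_rows: list) -> Optional[dict]:
        """Custom summary: report whether every row was scored as correct."""
        return {"all_correct": all(row["correct"] for row in score_rows)}


scorer = CorrectnessScorer()
pairs = [("4", "4"), ("5", "4"), ("6", "6")]  # (output, target) per dataset row
score_rows = [scorer.score(output=o, target=t) for o, t in pairs]

default_summary = default_like_summary(score_rows)
custom_summary = scorer.summarize(score_rows)
print(default_summary)
print(custom_summary)
```

A custom summarize is useful when the final value depends on all rows together, as in the all-or-nothing check here.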
If you want to learn more, check the implementation of CorrectnessLLMJudge.
Applying Scorers to a Call
To apply scorers to your Weave ops, you’ll need to use the .call() method, which provides access to both the operation’s result and its tracking information. This allows you to associate scorer results with specific calls in Weave’s database.
For more information on how to use the .call() method, see the Calling Ops guide.
You can apply a single scorer to a call, or apply multiple scorers to the same call.
Notes:
- Scorer results are automatically stored in Weave’s database
- Scorers run asynchronously after the main operation completes
- You can view scorer results in the UI or query them via the API
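The pattern described in this section can be mimicked without Weave to show its shape. Everything in the sketch below is a hypothetical stand-in: FakeCall and call_op imitate the idea that .call() returns both a result and a call record, and the apply_scorer method name and signature are assumptions rather than something this guide confirms. Nothing is stored in a database here.

```python
import asyncio


class FakeCall:
    """Hypothetical stand-in for the call record returned alongside a result."""

    def __init__(self, output):
        self.output = output
        self.scores = {}

    async def apply_scorer(self, scorer):
        # Run the scorer on the recorded output and attach the result to this call.
        self.scores[scorer.__name__] = scorer(self.output)


def call_op(fn, *args):
    """Hypothetical stand-in for op.call(): return (result, call record)."""
    result = fn(*args)
    return result, FakeCall(result)


def generate_text(prompt: str) -> str:
    return f"Hello! You said: {prompt}"


def is_short(output: str) -> dict:
    return {"is_short": len(output) < 50}


def is_polite(output: str) -> dict:
    return {"is_polite": output.startswith("Hello")}


async def main():
    # The main operation completes first and returns its result and call record.
    result, call = call_op(generate_text, "hi")
    # Several scorers are then applied to the same call.
    await asyncio.gather(call.apply_scorer(is_short), call.apply_scorer(is_polite))
    return result, call


result, call = asyncio.run(main())
print(result, call.scores)
```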
Use preprocess_model_input
You can use the preprocess_model_input parameter to modify dataset examples before they reach your model during evaluation.
For usage information and an example, see Using preprocess_model_input to format dataset rows before evaluating.
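As a minimal sketch of the idea, a preprocessing function takes a dataset example and returns the inputs the model expects. The function name preprocess_example, the column names, and the toy model are hypothetical; with Weave, a function like this would be supplied through the preprocess_model_input parameter rather than called by hand.

```python
def preprocess_example(example: dict) -> dict:
    """Hypothetical preprocessing function: reshape a dataset row for the model."""
    return {"prompt": f"Question: {example['input_text']}"}


def model(prompt: str) -> str:
    """Toy model that only accepts a prompt argument."""
    return prompt.upper()


row = {"input_text": "What is 2 + 2?", "expected": "4"}

# The model never sees the raw row, only the preprocessed inputs.
model_input = preprocess_example(row)
output = model(**model_input)
print(model_input, output)
```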
Score Analysis
In this section, we’ll show you how to analyze the scores for a single call, multiple calls, and all calls scored by a specific scorer.
Analyze a single Call’s Scores
Single Call API
To retrieve a single call and its scores, you can use the get_call method.
Single Call UI

Analyze multiple Calls’ Scores
Multiple Calls API
To retrieve multiple calls and their scores, you can use the get_calls method.
Multiple Calls UI

Analyze all Calls scored by a specific Scorer
All Calls by Scorer API
To retrieve all calls scored by a specific scorer, you can use the get_calls method.
All Calls by Scorer UI
Finally, if you would like to see all the calls scored by a Scorer, navigate to the Scorers tab in the UI and select the “Programmatic Scorer” tab. Click your Scorer to open the Scorer details page.
Then click the View Traces button under Scores to view all the calls scored by your Scorer.

